ranking policy
Product Ranking for Revenue Maximization with Multiple Purchases
Online retailing has become increasingly popular over the last decades [17, 28, 52]. Product ranking is crucial for online retailers because it determines consumers' shopping behavior [17] and thus influences retailers' revenue [20, 49]. For instance, the probability that a consumer purchases from a firm or clicks an advertisement is strongly related to the display order [8, 3, 33].
Supplementary Material: Fairness in Ranking under Uncertainty. A. Related Work
The group fairness perspective imposes constraints like demographic parity (Calders et al., 2009; Zliobaite, 2015) and equalized odds (Hardt et al., 2016). Although similar in spirit, our work sidesteps this need to define a similarity metric between agents in the feature space. Rather, we view an agent's ... Ranking has been widely studied in the field of Information Retrieval (IR), mostly in the context of optimizing user utility. The Probability Ranking Principle (PRP) (Robertson, 1977), a guiding principle for ranking in IR, states that user utility is optimal when documents (i.e., the agents) are ... Besides ranking diversity, IR methods have dealt with uncertainty in relevance that comes via users' implicit or explicit feedback (Penha and Hauff, 2021; Soufiani et al., 2012), as well as stochasticity arising ... Kearns et al. (2017) present a way to fairly select ... Hence, they propose using the true CDF rank as a derived merit criterion that can be compared. Thus, a fair principal stands to gain more by obtaining perfect information.
Off-Policy Evaluation of Ranking Policies via Embedding-Space User Behavior Modeling
Takahashi, Tatsuki, Maru, Chihiro, Shoji, Hiroko
Off-policy evaluation (OPE) in ranking settings with large ranking action spaces, which stems from an increase in both the number of unique actions and length of the ranking, is essential for assessing new recommender policies using only logged bandit data from previous versions. To address the high variance issues associated with existing estimators, we introduce two new assumptions: no direct effect on rankings and user behavior model on ranking embedding spaces. We then propose the generalized marginalized inverse propensity score (GMIPS) estimator with statistically desirable properties compared to existing ones. Finally, we demonstrate that the GMIPS achieves the lowest MSE. Notably, among GMIPS variants, the marginalized reward interaction IPS (MRIPS) incorporates a doubly marginalized importance weight based on a cascade behavior assumption on ranking embeddings. MRIPS effectively balances the trade-off between bias and variance, even as the ranking action spaces increase and the above assumptions may not hold, as evidenced by our experiments.
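For orientation, the baseline that marginalized estimators of this kind build on can be sketched as follows. This is a minimal sketch of the standard (vanilla) inverse propensity score estimator, not the GMIPS or MRIPS estimators described above, whose definitions are not given in this abstract. The function name `ips_estimate` and its parameter names are hypothetical, chosen for illustration. Each logged sample is assumed to carry the reward observed for the whole ranking and the probability with which the logging and target policies would have produced that ranking.

```python
def ips_estimate(rewards, logging_probs, target_probs):
    """Vanilla IPS estimate of a target policy's value from logged data.

    rewards[i]       : reward observed for the i-th logged ranking
    logging_probs[i] : probability the logging policy chose that ranking
    target_probs[i]  : probability the target policy would choose it
    (All names are illustrative assumptions, not taken from the paper.)
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one logged sample")
    if not (n == len(logging_probs) == len(target_probs)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for reward, p_log, p_tgt in zip(rewards, logging_probs, target_probs):
        if p_log <= 0.0:
            # IPS requires the logging policy to have full support.
            raise ValueError("logging probabilities must be positive")
        # Importance weight corrects for the policy mismatch.
        total += (p_tgt / p_log) * reward
    return total / n
```

Because the probability of any single ranking shrinks quickly as the number of items and the ranking length grow, the importance weights in this baseline can become very large, which is the variance problem the abstract refers to.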
Policy Learning for Fairness in Ranking
Conventional Learning-to-Rank (LTR) methods optimize the utility of the rankings to the users, but they are oblivious to their impact on the ranked items. However, there has been a growing understanding that the latter is important to consider for a wide range of ranking applications (e.g. ...). To address this need, we propose a general LTR framework that can optimize a wide range of utility metrics (e.g. ...). This framework expands the class of learnable ranking functions to stochastic ranking policies, which provides a language for rigorously expressing fairness specifications. Furthermore, we provide a new LTR algorithm called Fair-PG-Rank for directly searching the space of fair ranking policies via a policy-gradient approach. Beyond the theoretical evidence in deriving the framework and the algorithm, we provide empirical results on simulated and real-world datasets verifying the effectiveness of the approach in individual and group-fairness settings.
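To make the notion of a stochastic ranking policy concrete, the sketch below samples a ranking from a Plackett-Luce distribution over item scores and returns its log-probability, the quantity a policy-gradient method would differentiate. The Plackett-Luce form is an assumption made here for illustration; the abstract does not specify the policy class, and this is not the Fair-PG-Rank algorithm itself. The function name `sample_ranking` is hypothetical.

```python
import math
import random


def sample_ranking(scores, rng):
    """Sample a ranking from a Plackett-Luce model over item scores.

    Items are drawn one position at a time, without replacement, with
    probability proportional to exp(score). Returns the ranking (a list
    of item indices) and the log-probability of that ranking.
    """
    remaining = list(range(len(scores)))
    ranking = []
    log_prob = 0.0
    while remaining:
        # Subtract the max score for numerical stability of exp().
        top = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - top) for i in remaining]
        total = sum(weights)
        # Draw the position in `remaining` of the next item.
        pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        log_prob += math.log(weights[pick] / total)
        ranking.append(remaining.pop(pick))
    return ranking, log_prob


rng = random.Random(0)  # fixed seed only to make the draw repeatable
ranking, log_prob = sample_ranking([2.0, 1.0, 0.0], rng)
```

In a REINFORCE-style update, the gradient of `log_prob` with respect to the scores would be weighted by the utility of the sampled ranking, possibly combined with a fairness penalty; how Fair-PG-Rank defines those terms is not stated in this abstract.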